Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Neural Information Processing Systems

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our estimator in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.
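
To make the construction concrete, the following is a minimal PyTorch sketch of a DiCE-style "magic box" operator and a per-step surrogate objective built from advantages, in the spirit of Loaded DiCE. The function names (magic_box, loaded_dice_objective) and the tensor layout are illustrative assumptions, not the authors' reference implementation.

import torch

def magic_box(x):
    # DiCE "magic box": evaluates to 1 in the forward pass, but its derivatives
    # (at any order, under automatic differentiation) reintroduce the score-function term.
    return torch.exp(x - x.detach())

def loaded_dice_objective(log_probs, advantages):
    # Sketch of a Loaded-DiCE-style surrogate (not the authors' reference code).
    # log_probs:  tensor [T] of log pi(a_t | s_t), differentiable w.r.t. policy parameters
    # advantages: tensor [T] of advantage estimates A(s_t, a_t), treated here as fixed
    deps_incl = torch.cumsum(log_probs, dim=0)   # sum_{t' <= t} log pi(a_{t'} | s_{t'})
    deps_excl = deps_incl - log_probs            # sum_{t' <  t} log pi(a_{t'} | s_{t'})
    # Each term's first derivative is grad log pi(a_t | s_t) * A_t, while the magic box
    # keeps the dependencies on earlier actions that higher-order derivatives require.
    per_step = (magic_box(deps_incl) - magic_box(deps_excl)) * advantages.detach()
    return per_step.sum()

Note that this surrogate evaluates to zero in the forward pass; only its derivatives, taken by automatic differentiation, carry the gradient estimates.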



Reviews: Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Neural Information Processing Systems

The DiCE gradient estimator [1] allows the computation of higher-order derivatives in stochastic computation graphs. This may be useful in contexts such as multi-agent learning or meta-RL, where the proper application of methods such as MAML requires the computation of second-order derivatives. The current paper extends DiCE and derives a more general objective that allows integration of the advantage A(s_t, a_t) = Q(s_t, a_t) - V(s_t) in order to control the variance while providing unbiased estimates. The advantage can be approximated by trading off variance for bias using parametric function approximators and methods such as Generalized Advantage Estimation (GAE). Moreover, the authors propose to further control the variance of the higher-order gradients by discounting the impact of past actions on the current advantage, thus limiting the range of causal dependencies. This paper is well executed: it is well written, technically sound and potentially impactful.
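
For readers unfamiliar with GAE, the sketch below shows one standard way to compute such advantage estimates from rewards and a learned value function. The function name gae_advantages and the tensor shapes are assumptions for illustration; this is not code from the paper.

import torch

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation (illustrative sketch).
    # rewards: tensor [T]   of rewards r_t
    # values:  tensor [T+1] of value estimates V(s_0), ..., V(s_T); the last entry bootstraps
    # Returns: tensor [T]   of advantage estimates A_t, treated as fixed targets.
    values = values.detach()
    deltas = rewards + gamma * values[1:] - values[:-1]   # TD residuals delta_t
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros(())
    for t in reversed(range(rewards.shape[0])):
        gae = deltas[t] + gamma * lam * gae                # A_t = delta_t + gamma*lam*A_{t+1}
        advantages[t] = gae
    return advantages

Larger lam yields lower bias but higher variance; smaller lam leans more heavily on the (possibly biased) value function.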


Reviews: Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Neural Information Processing Systems

This paper presents a novel methodology that, in combination with automatic differentiation, yields unbiased, low-variance estimators of derivatives at any order. It appears to be potentially widely useful, and the exposition is clear. The reviewers and I seem to be in general agreement in liking the paper. Reviewer 1 wrote a thorough review touching on many aspects of the paper. The overall score was 7, and their bottom-line positive was: "This paper is well executed: it is well written, technically sound and potentially impactful."


Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Farquhar, Gregory, Whiteson, Shimon, Foerster, Jakob

Neural Information Processing Systems

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our estimator in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.


Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning

Farquhar, Gregory, Whiteson, Shimon, Foerster, Jakob

arXiv.org Machine Learning

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our objective in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.
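
One way to read "discounting the impact of more distant causal dependencies" is to down-weight the log-probabilities of older actions by a factor lambda**(t - t') before applying the magic-box operator. The sketch below illustrates that idea only; the exact weighting scheme is the one defined in the paper, and the helper discounted_dependencies is hypothetical.

import torch

def discounted_dependencies(log_probs, lam):
    # Exponentially discounted sums of past log-probabilities (illustrative).
    # Returns two tensors of shape [T]:
    #   incl[t] = sum_{t' <= t} lam**(t - t') * log_probs[t']
    #   excl[t] = sum_{t' <  t} lam**(t - t') * log_probs[t']
    incl = []
    running = torch.zeros(())
    for lp in log_probs:
        running = lam * running + lp
        incl.append(running)
    incl = torch.stack(incl)
    excl = incl - log_probs
    return incl, excl

# Usage (hypothetical), reusing magic_box from the earlier sketch:
#   incl, excl = discounted_dependencies(log_probs, lam=0.9)
#   surrogate = ((magic_box(incl) - magic_box(excl)) * advantages.detach()).sum()
# With lam = 1 this recovers the full-dependency objective; with lam = 0 the higher-order
# dependencies on past actions vanish, trading variance reduction for bias.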